USHIFT-6951: Add metrics exporters as optional MicroShift components by copejon · Pull Request #6763 · openshift/microshift

copejon · 2026-05-29T00:32:11Z

Deliver metrics-server, kube-state-metrics, and node-exporter as optional MicroShift components packaged in a single microshift-metrics RPM. When installed alongside microshift-observability, the otel-collector automatically scrapes all three exporters via drop-in config.

What's included

microshift-metrics RPM — kustomize manifests for metrics-server, kube-state-metrics, and node-exporter with SCC RBAC, service-ca TLS, and health check registration
metrics-server kubelet client cert — dedicated cert under the kube-apiserver-to-kubelet-signer provisioned as a Secret, avoiding hostPath SELinux issues
Auto-rebase integration — CMO clone + manifest copy recipe in handle_assets.py, plus update_metrics_images() in rebase.sh to populate arch-specific overlays and release JSON from the OCP payload
Otel-collector drop-in — microshift-metrics.yaml scrape configs for all three exporters using Kubernetes endpoints SD with service-CA TLS

Made with Cursor

Summary by CodeRabbit

Release Notes

New Features
- Added optional metrics-server component for Kubernetes cluster metrics
- Added kube-state-metrics for monitoring Kubernetes resource states
- Added node-exporter for host and node-level metrics
- Integrated OpenTelemetry Collector for metrics collection and export
Chores
- Added RPM packages for metrics components (x86_64 and aarch64)
- Added RHEL 10 support in development environment tools
- Updated deployment manifests and Kubernetes RBAC configurations for metrics workloads

openshift-ci-robot · 2026-05-29T00:32:15Z

@copejon: This pull request references USHIFT-6951 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Deliver metrics-server, kube-state-metrics, and node-exporter as optional MicroShift components packaged in a single microshift-metrics RPM. When installed alongside microshift-observability, the otel-collector automatically scrapes all three exporters via drop-in config.

What's included

microshift-metrics RPM — kustomize manifests for metrics-server, kube-state-metrics, and node-exporter with SCC RBAC, service-ca TLS, and health check registration

metrics-server kubelet client cert — dedicated cert under the kube-apiserver-to-kubelet-signer provisioned as a Secret, avoiding hostPath SELinux issues

Auto-rebase integration — CMO clone + manifest copy recipe in handle_assets.py, plus update_metrics_images() in rebase.sh to populate arch-specific overlays and release JSON from the OCP payload

Otel-collector drop-in — microshift-metrics.yaml scrape configs for all three exporters using Kubernetes endpoints SD with service-CA TLS

Made with Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-05-29T00:32:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 240fac84-8424-42e1-9105-110e8f97cbb4

📥 Commits

Reviewing files that changed from the base of the PR and between e53c5d4 and 765807b.

📒 Files selected for processing (4)

packaging/observability/otelcol.d/microshift-metrics-kube-state-metrics.yaml
packaging/observability/otelcol.d/microshift-metrics-node-exporter.yaml
packaging/observability/otelcol.d/microshift-metrics-server.yaml
packaging/rpm/microshift.spec

Walkthrough

This PR integrates three optional metrics collection components (metrics-server, kube-state-metrics, node-exporter) into MicroShift via Kubernetes manifests, RPM packaging, and automated build/deployment infrastructure. The implementation includes certificate provisioning, multi-architecture image overrides, and OpenTelemetry collector integration.

Changes

Metrics Components

Layer / File(s)	Summary
Metrics component manifests and kustomization `assets/optional/metrics-server/`, `assets/optional/kube-state-metrics/`, `assets/optional/node-exporter/*`	Kubernetes manifests define three metrics exporters with service accounts, RBAC roles/bindings, deployments/daemonsets, services, and supporting ConfigMaps. Kustomization files provide base overlays and per-architecture (aarch64/x86_64) image remappings with SHA256 digests.
Certificate provisioning and TLS material `assets/components/metrics-server/*`, `pkg/cmd/init.go`, `pkg/cmd/metrics.go`, `pkg/cmd/run.go`, `pkg/util/cryptomaterial/certinfo.go`	Metrics-server kubelet client certificates are provisioned at startup; CSR configuration in init.go defines the `system:metrics-server` identity, `provisionMetricsServerCerts()` polls for the namespace and applies TLS secrets/ConfigMaps, and a background goroutine in RunMicroshift coordinates provisioning.
RPM subpackage definitions and installation `packaging/rpm/microshift.spec`	Four new RPM subpackages package the three metrics components and release metadata; `%install` creates OTEL drop-in directories, installs per-arch kustomization manifests and release JSON, and OTEL collector drop-in YAMLs marked `%config(noreplace)`.
Build and rebase automation for metrics manifests `scripts/auto-rebase/assets_metrics.yaml`, `scripts/auto-rebase/rebase.sh`	Auto-rebase script clones `cluster-monitoring-operator` and copies metrics assets via `handle_assets.py`, regenerates per-arch release metadata JSON, and rewrites kustomization image overrides from OCP release digests.
Runtime integration and observability `packaging/observability/microshift-observability.service`, `packaging/observability/otelcol.d/*`, `pkg/healthcheck/microshift_optional_workloads.go`	Systemd service now loads per-component OTel drop-in YAMLs; three new drop-in configs define Prometheus receivers for metrics-server, kube-state-metrics, and node-exporter. Health checks recognize the new optional workloads and merge namespace expectations across multiple manifest paths.
Development and test infrastructure `scripts/devenv-builder/*`, `test/bin/common.sh`	VM creation scripts detect RHEL 10+ and enable UEFI boot; ISO filename helper recognizes RHEL 10 versions; test allowlist adds metrics RPM packages.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

pacevedom
ggiguash

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

Ignore

❌ Failed checks (1 error, 2 warnings)

Check name	Status	Explanation	Resolution
Container-Privileges	❌ Error	Node-exporter has justified privileged: true and hostPID/hostNetwork for monitoring, but kube-state-metrics containers lack allowPrivilegeEscalation: false explicitly set, defaulting to unsafe.	Add allowPrivilegeEscalation: false to all kube-state-metrics containers. Node-exporter privileges are acceptable given its system monitoring requirements.
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.06% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Topology-Aware Scheduling Compatibility	⚠️ Warning	metrics-server has toleration for master nodes, and node-exporter has wildcard operator:Exists toleration. Both risk scheduling to arbiter nodes on TNA topology.	Exclude arbiter nodes via nodeAffinity or label selectors. Replace node-exporter's operator:Exists with specific tolerations that exclude node-role.kubernetes.io/arbiter:NoSchedule.

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main changeset: adding three metrics exporters (metrics-server, kube-state-metrics, node-exporter) as optional MicroShift components.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR adds no Ginkgo tests. Test files in this codebase use Go's standard testing package (testing.T), not Ginkgo, so test name stability check is not applicable.
Test Structure And Quality	✅ Passed	No Ginkgo test files (*_test.go) were added or modified in this PR. The custom check for Ginkgo test structure is not applicable.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added in this PR. The PR only modifies regular source code files and configuration manifests for adding optional metrics exporters.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No Ginkgo e2e tests (It/Describe/Context/When) were added in this PR. Changes are infrastructure manifests, Go source code, packaging, and scripts—not e2e tests.
Ote Binary Stdout Contract	✅ Passed	This PR modifies operational MicroShift service code (metrics components) and manifests, not OTE/test extension binaries. The OTE stdout contract check does not apply.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR adds no Ginkgo e2e tests—only Kubernetes manifests, Go functions, scripts, and RPM specs. IPv6/disconnected network check not applicable.
No-Weak-Crypto	✅ Passed	No weak cryptography found. All TLS uses strong ciphers (ECDHE+AES-GCM/ChaCha20-Poly1305), standard Go crypto libraries, no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, and no unsafe comparisons.
No-Sensitive-Data-In-Logs	✅ Passed	No sensitive data logging found. All logging statements log only non-sensitive items: file paths, namespace names, component names, generic error messages, and manifest paths.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-29T00:32:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: copejon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [copejon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 17

♻️ Duplicate comments (1)

assets/optional/metrics-server/release-metrics-aarch64.json (1)
3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify placeholder replacement for release base and image digests.

All placeholder values ("base": "placeholder" and sha256:placeholder digests) must be replaced before deployment. Confirm that update_metrics_images() populates these values from the OCP payload during the build process.

Also applies to: 6-8
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/metrics-server/release-metrics-aarch64.json` at line 3, The
JSON contains placeholder values ("base": "placeholder" and image digests like
sha256:placeholder) that must be populated from the OCP payload; update the
update_metrics_images() function so it reads the OCP payload during the build,
replaces the "base": "placeholder" field and any sha256:placeholder digests in
release-metrics-aarch64.json (and the other affected files) with the actual
values, and fail the build with a clear error if any placeholders remain after
processing to prevent deployment with placeholders.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/optional/kube-state-metrics/02-clusterrole.yaml`:
- Around line 6-22: The ClusterRole currently grants cluster-wide list/watch on
"secrets" in the resources block; remove "secrets" from that resources list in
the ClusterRole manifest (or move it into a separate opt-in ClusterRole/Role
bound only where secret metrics are explicitly enabled) so the
kube-state-metrics workload no longer has cluster-wide secret read access;
update any docs/values that toggle secret metrics to clearly opt-in if you
reintroduce a separate role.

In `@assets/optional/kube-state-metrics/04-deployment.yaml`:
- Around line 106-111: The manifest pins kube-state-metrics to control-plane
nodes via the nodeSelector (kubernetes.io/os: linux +
node-role.kubernetes.io/master) and a matching toleration; remove the
nodeSelector and the tolerations block from the Deployment so the workload is
not forced onto master/control-plane nodes (or if a softer preference is
required, replace with a preferredDuringSchedulingIgnoredDuringExecution
nodeAffinity targeting non-control-plane nodes instead of a hard
nodeSelector/toleration).
- Around line 38-41: The securityContext blocks (securityContext in the
container spec) are missing explicit capability drops; update each container's
securityContext to include capabilities.drop: ["ALL"] and then add back only any
strictly required capabilities via capabilities.add if needed (i.e., modify the
existing securityContext entries where allowPrivilegeEscalation,
readOnlyRootFilesystem, runAsNonRoot are set); apply the same change to the
other container securityContext occurrences in this file so all containers
explicitly drop all capabilities and only re-add minimal required ones.
- Around line 25-105: The pod is missing liveness/readiness probes and resource
limits for all three containers (kube-state-metrics, kube-rbac-proxy-main,
kube-rbac-proxy-self); for each container add a resources.limits stanza (e.g.
kube-state-metrics limits cpu/memory higher than requests; kube-rbac-proxy-main
and kube-rbac-proxy-self likewise) and add readinessProbe and livenessProbe
entries: for kube-state-metrics use an HTTP GET to port 8081 path /metrics (or
/healthz if preferred) and for kube-rbac-proxy-main/self use HTTPS HTTPGet
probes pointing to their secure-listen ports (8443 and 9443) with appropriate
scheme: HTTPS and path /readyz or /healthz, plus reasonable
initialDelaySeconds/periodSeconds/timeoutSeconds values; place these probes and
limits under the corresponding container blocks (identified by name:
kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self).

In `@assets/optional/metrics-server/00-namespace.yaml`:
- Around line 5-9: The namespace-level pod security labels
pod-security.kubernetes.io/enforce, pod-security.kubernetes.io/audit, and
pod-security.kubernetes.io/warn are set to privileged and must be removed from
the namespace manifest to avoid opt-out of baseline/restricted controls; delete
those three labels from the namespace YAML and instead apply the exception only
to the specific workload by adding the equivalent labels/annotations on the
Deployment/DaemonSet/Pod manifest(s) for the metrics server or bind a targeted
ServiceAccount to a custom/restricted SCC (or cluster-scoped
SecurityContextConstraints) so only that workload runs with elevated privileges
rather than the entire openshift-monitoring namespace.

In `@assets/optional/metrics-server/05-deployment.yaml`:
- Around line 91-96: The manifest currently pins the pod to control-plane nodes
via the nodeSelector and tolerations entries (nodeSelector, tolerations, key:
node-role.kubernetes.io/master); remove these fields so the Deployment no longer
hard-requires scheduling on master/control-plane nodes, or replace them with a
non-mandatory soft constraint (e.g.,
preferredDuringSchedulingIgnoredDuringExecution style affinity) if you need a
preference—ensure no direct reference to node-role.kubernetes.io/master or
control-plane remains in the Deployment spec.
- Around line 57-60: The container spec currently only sets resource requests
(resources.requests.cpu and resources.requests.memory); add corresponding
resource limits by adding a resources.limits block with both cpu and memory
(e.g., limits.cpu and limits.memory) for the same container entry so the pod
cannot overconsume under pressure—update the resources section in the Deployment
container spec (the resources field under the container definition in this YAML)
to include limits for cpu and memory matching your expected upper bounds.
- Around line 61-63: The manifest's securityContext currently sets
allowPrivilegeEscalation and readOnlyRootFilesystem but omits runAsNonRoot and
capability drops; update the securityContext block (the same object containing
allowPrivilegeEscalation and readOnlyRootFilesystem) to set runAsNonRoot: true
and add capabilities: drop: ["ALL"] (and only add back any specific capabilities
if the container truly needs them), ensuring the hardening is enforced in the
deployment spec.

In `@assets/optional/node-exporter/02-clusterrole.yaml`:
- Around line 16-22: The ClusterRole currently grants access to the built-in
privileged SCC by listing resourceNames: - privileged for
securitycontextconstraints; replace this by creating and referencing a custom,
minimal SCC (e.g., node-exporter-scc) that only allows the specific host
namespaces and capabilities the DaemonSet requires, then update the
ClusterRole/RoleBinding to reference resourceNames: - node-exporter-scc (or the
custom SCC name) instead of privileged; ensure the ServiceAccount used by
node-exporter is the subject of the binding and that the custom SCC limits
capabilities, hostPID, hostNetwork, hostPath and any other host access to the
minimum necessary.

In `@assets/optional/node-exporter/04-daemonset.yaml`:
- Around line 25-26: The DaemonSet currently sets hostPID: true which
unnecessarily broadens the pod's privileges; remove the hostPID: true entry from
the Pod spec (spec.template.spec.hostPID) so the pod does not join the host PID
namespace unless a collector explicitly requires it, and verify the exporter
still functions using /host/root and /host/sys mounts; if a collector truly
requires host PID, add a short inline comment next to hostPID and a
justification in the PR describing why the exception is necessary and document
which collector needs it.
- Around line 27-96: Add resource limits and liveness/readiness probes for both
containers: for container "node-exporter" (name node-exporter) add
resources.limits for cpu and memory (e.g., match requests ratio like cpu: 50m,
memory: 64Mi) and add a readinessProbe and livenessProbe performing an HTTP GET
to path /metrics on port 9100 (scheme HTTP) with sensible timing
(initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold); for
container "kube-rbac-proxy" (name kube-rbac-proxy) add resources.limits (e.g.,
cpu: 50m, memory: 64Mi) and readinessProbe/livenessProbe using HTTP GET to path
/metrics on port 9101 but with scheme HTTPS (and tls settings honored) and
similar timing values; ensure these probe blocks are added under each container
and adjust values to match cluster SLOs.
- Around line 58-84: Update the kube-rbac-proxy container's securityContext to
harden the sidecar: add allowPrivilegeEscalation: false, set runAsNonRoot: true
and an explicit non-root runAsUser (choose a non-zero UID), and drop all Linux
capabilities (capabilities.drop: ["ALL"] and only add back any minimal
capability if proven necessary); keep readOnlyRootFilesystem: true and do not
change other mounts. Target the container named "kube-rbac-proxy" in the
DaemonSet spec when applying these changes.

In `@assets/optional/node-exporter/kustomization.aarch64.yaml`:
- Around line 1-7: Replace the placeholder digest values for the two images in
kustomization.aarch64.yaml: update the digest fields under the image entries for
"quay.io/openshift/node-exporter" and "quay.io/openshift/kube-rbac-proxy" with
the actual sha256 digests that correspond to the referenced newName images
(registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9 and
registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9); obtain the exact
digests from your image registry (e.g., skopeo inspect or the registry API/CI
output) and commit those real sha256:... values in place of "sha256:placeholder"
so the aarch64 overlay pins to real images.

In `@packaging/observability/otelcol.d/microshift-metrics.yaml`:
- Around line 10-27: The HTTPS scrapes discovered via
kubernetes_sd_configs.role: endpoints will use pod IPs for TLS ServerName and
fail hostname verification; update the TLS config for the
prometheus/kube_state_metrics scrape (job_name: kube-state-metrics) and likewise
for prometheus/metrics_server to include tls_config.server_name set to the
service FQDN (for example kube-state-metrics.openshift-monitoring.svc) so
certificate hostname verification uses the service DNS name instead of the pod
IP; add tls_config.server_name alongside the existing ca_file entry under
prometheus/kube_state_metrics (and the equivalent block for
prometheus/metrics_server).

In `@pkg/healthcheck/microshift_optional_workloads.go`:
- Around line 42-53: The three manifest entries for openshift-monitoring are
clobbering each other because fillOptionalMicroShiftWorkloads() keeps a single
NamespaceWorkloads per namespace; update the initialization so entries that
target the same Namespace ("openshift-monitoring") are merged rather than
overwritten: when building the map of NamespaceWorkloads (type
NamespaceWorkloads) combine Deployments and DaemonSets arrays (e.g., merge
"metrics-server" and "kube-state-metrics" into Deployments and "node-exporter"
into DaemonSets) before assigning to the namespace key in
fillOptionalMicroShiftWorkloads(), ensuring duplicates are avoided.

In `@scripts/auto-rebase/rebase.sh`:
- Around line 1136-1159: The METRICS_JSON_MAP associative array currently misses
the kube-rbac-proxy entry, so update METRICS_JSON_MAP (declared as
METRICS_JSON_MAP=( ... )) to include a mapping for the kube-rbac-proxy image
(e.g., add ["kube_rbac_proxy"]="kube-rbac-proxy" or a consistent key name used
elsewhere), so the for-loop that iterates json_key/release_tag/new_image will
pick up and write kube-rbac-proxy into the generated
release-metrics-${arch}.json files the same way as
metrics_server/kube_state_metrics/node_exporter.

In `@scripts/devenv-builder/create-vm.sh`:
- Around line 58-63: The RHEL major detection in create-vm.sh uses grep -P with
a pattern that can fail under set -euo when VMNAME contains just "-10" (no dot),
causing the script to exit and BOOT_OPTS to never be set; change the logic to
robustly derive RHEL_MAJOR (prefer MICROSHIFT_RHEL_VERSION if present, otherwise
extract digits from VMNAME) and avoid grep failing on no-match (e.g., use grep
-oE '\d+' with a fallback like "|| true" or use pure shell parameter expansion
to strip the suffix and parse digits), then compute RHEL_MAJOR=${RHEL_MAJOR%%.*}
or cut -d. -f1 and keep the existing BOOT_OPTS assignment using that value.

---

Duplicate comments:
In `@assets/optional/metrics-server/release-metrics-aarch64.json`:
- Line 3: The JSON contains placeholder values ("base": "placeholder" and image
digests like sha256:placeholder) that must be populated from the OCP payload;
update the update_metrics_images() function so it reads the OCP payload during
the build, replaces the "base": "placeholder" field and any sha256:placeholder
digests in release-metrics-aarch64.json (and the other affected files) with the
actual values, and fail the build with a clear error if any placeholders remain
after processing to prevent deployment with placeholders.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6e479cdc-b516-4f90-bdff-ac9801aa36ee

📥 Commits

Reviewing files that changed from the base of the PR and between a0c1df6 and 25ea1dc.

📒 Files selected for processing (45)

assets/components/metrics-server/kubelet-ca-configmap.yaml
assets/components/metrics-server/kubelet-client-secret.yaml
assets/optional/kube-state-metrics/01-serviceaccount.yaml
assets/optional/kube-state-metrics/02-clusterrole.yaml
assets/optional/kube-state-metrics/03-clusterrolebinding.yaml
assets/optional/kube-state-metrics/04-deployment.yaml
assets/optional/kube-state-metrics/05-service.yaml
assets/optional/kube-state-metrics/kustomization.aarch64.yaml
assets/optional/kube-state-metrics/kustomization.x86_64.yaml
assets/optional/kube-state-metrics/kustomization.yaml
assets/optional/metrics-server/00-namespace.yaml
assets/optional/metrics-server/01-serviceaccount.yaml
assets/optional/metrics-server/02-clusterrole.yaml
assets/optional/metrics-server/03-clusterrolebinding.yaml
assets/optional/metrics-server/04-rolebinding.yaml
assets/optional/metrics-server/05-deployment.yaml
assets/optional/metrics-server/06-service.yaml
assets/optional/metrics-server/07-apiservice.yaml
assets/optional/metrics-server/kustomization.aarch64.yaml
assets/optional/metrics-server/kustomization.x86_64.yaml
assets/optional/metrics-server/kustomization.yaml
assets/optional/metrics-server/release-metrics-aarch64.json
assets/optional/metrics-server/release-metrics-x86_64.json
assets/optional/node-exporter/01-serviceaccount.yaml
assets/optional/node-exporter/02-clusterrole.yaml
assets/optional/node-exporter/03-clusterrolebinding.yaml
assets/optional/node-exporter/04-daemonset.yaml
assets/optional/node-exporter/05-service.yaml
assets/optional/node-exporter/kustomization.aarch64.yaml
assets/optional/node-exporter/kustomization.x86_64.yaml
assets/optional/node-exporter/kustomization.yaml
packaging/observability/microshift-observability.service
packaging/observability/otelcol.d/microshift-metrics.yaml
packaging/rpm/microshift.spec
pkg/cmd/init.go
pkg/cmd/metrics.go
pkg/cmd/run.go
pkg/healthcheck/microshift_optional_workloads.go
pkg/util/cryptomaterial/certinfo.go
scripts/auto-rebase/assets_metrics.yaml
scripts/auto-rebase/rebase.sh
scripts/devenv-builder/config/kickstart.ks.template
scripts/devenv-builder/create-vm.sh
scripts/devenv-builder/manage-vm.sh
test/bin/common.sh

coderabbitai · 2026-05-29T00:45:09Z

+      containers:
+        - name: kube-state-metrics
+          image: quay.io/openshift/kube-state-metrics:latest
+          imagePullPolicy: IfNotPresent
+          args:
+            - --host=127.0.0.1
+            - --port=8081
+            - --telemetry-host=127.0.0.1
+            - --telemetry-port=8082
+          resources:
+            requests:
+              cpu: 10m
+              memory: 64Mi
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
+        - name: kube-rbac-proxy-main
+          image: quay.io/openshift/kube-rbac-proxy:latest
+          imagePullPolicy: IfNotPresent
+          args:
+            - --secure-listen-address=:8443
+            - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
+            - --tls-min-version=VersionTLS12
+            - --upstream=http://127.0.0.1:8081/
+            - --tls-cert-file=/etc/tls/private/tls.crt
+            - --tls-private-key-file=/etc/tls/private/tls.key
+          ports:
+            - containerPort: 8443
+              name: https-main
+              protocol: TCP
+          resources:
+            requests:
+              cpu: 10m
+              memory: 40Mi
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
+          volumeMounts:
+            - name: metrics-tls
+              mountPath: /etc/tls/private
+              readOnly: true
+            - name: tmp
+              mountPath: /tmp
+        - name: kube-rbac-proxy-self
+          image: quay.io/openshift/kube-rbac-proxy:latest
+          imagePullPolicy: IfNotPresent
+          args:
+            - --secure-listen-address=:9443
+            - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
+            - --tls-min-version=VersionTLS12
+            - --upstream=http://127.0.0.1:8082/
+            - --tls-cert-file=/etc/tls/private/tls.crt
+            - --tls-private-key-file=/etc/tls/private/tls.key
+          ports:
+            - containerPort: 9443
+              name: https-self
+              protocol: TCP
+          resources:
+            requests:
+              cpu: 10m
+              memory: 40Mi
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
+          volumeMounts:
+            - name: metrics-tls
+              mountPath: /etc/tls/private
+              readOnly: true
+            - name: tmp-self
+              mountPath: /tmp
+      volumes:
+        - name: metrics-tls
+          secret:
+            secretName: kube-state-metrics-tls
+        - name: tmp
+          emptyDir: {}
+        - name: tmp-self
+          emptyDir: {}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add probes and limits for all three containers.

This pod defines requests only, and none of the containers have liveness/readiness probes. That leaves the rollout without health gating and the workload without hard resource ceilings.

As per coding guidelines, "Resource limits (cpu, memory) on every container" and "Liveness + readiness probes defined".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/04-deployment.yaml` around lines 25 - 105, The pod is missing liveness/readiness probes and resource limits for all three containers (kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self); for each container add a resources.limits stanza (e.g. kube-state-metrics limits cpu/memory higher than requests; kube-rbac-proxy-main and kube-rbac-proxy-self likewise) and add readinessProbe and livenessProbe entries: for kube-state-metrics use an HTTP GET to port 8081 path /metrics (or /healthz if preferred) and for kube-rbac-proxy-main/self use HTTPS HTTPGet probes pointing to their secure-listen ports (8443 and 9443) with appropriate scheme: HTTPS and path /readyz or /healthz, plus reasonable initialDelaySeconds/periodSeconds/timeoutSeconds values; place these probes and limits under the corresponding container blocks (identified by name: kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self).

coderabbitai · 2026-05-29T00:45:09Z

+  labels:
+    name: openshift-monitoring
+    pod-security.kubernetes.io/enforce: privileged
+    pod-security.kubernetes.io/audit: privileged
+    pod-security.kubernetes.io/warn: privileged


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid namespace-wide privileged pod security labels.

Setting enforce, audit, and warn to privileged makes the whole namespace opt out of baseline/restricted controls. Scope the exception to the specific workload/SCC instead of weakening every pod in openshift-monitoring.

As per coding guidelines, **/*.{yaml,yml}: OpenShift: SCC must be restricted or custom-scoped.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/metrics-server/00-namespace.yaml` around lines 5 - 9, The namespace-level pod security labels pod-security.kubernetes.io/enforce, pod-security.kubernetes.io/audit, and pod-security.kubernetes.io/warn are set to privileged and must be removed from the namespace manifest to avoid opt-out of baseline/restricted controls; delete those three labels from the namespace YAML and instead apply the exception only to the specific workload by adding the equivalent labels/annotations on the Deployment/DaemonSet/Pod manifest(s) for the metrics server or bind a targeted ServiceAccount to a custom/restricted SCC (or cluster-scoped SecurityContextConstraints) so only that workload runs with elevated privileges rather than the entire openshift-monitoring namespace.

coderabbitai · 2026-05-29T00:45:09Z

+images:
+  - name: quay.io/openshift/node-exporter
+    newName: registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9
+    digest: sha256:placeholder
+  - name: quay.io/openshift/kube-rbac-proxy
+    newName: registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9
+    digest: sha256:placeholder


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Replace the placeholder digests before merge.

sha256:placeholder makes the aarch64 overlay unusable and breaks image pinning for that architecture. If automation is expected to rewrite this, the generated digests need to be committed here, like the x86_64 overlay already does.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/node-exporter/kustomization.aarch64.yaml` around lines 1 - 7, Replace the placeholder digest values for the two images in kustomization.aarch64.yaml: update the digest fields under the image entries for "quay.io/openshift/node-exporter" and "quay.io/openshift/kube-rbac-proxy" with the actual sha256 digests that correspond to the referenced newName images (registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9 and registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9); obtain the exact digests from your image registry (e.g., skopeo inspect or the registry API/CI output) and commit those real sha256:... values in place of "sha256:placeholder" so the aarch64 overlay pins to real images.

coderabbitai · 2026-05-29T00:45:09Z

+  prometheus/kube_state_metrics:
+    config:
+      scrape_configs:
+        - job_name: kube-state-metrics
+          scrape_interval: 30s
+          scheme: https
+          tls_config:
+            ca_file: /var/lib/microshift/certs/service-ca/ca.crt
+          kubernetes_sd_configs:
+            - kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig
+              role: endpoints
+              namespaces:
+                names: [openshift-monitoring]
+          relabel_configs:
+            - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
+              action: keep
+              regex: kube-state-metrics;https-main
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail file="packaging/observability/otelcol.d/microshift-metrics.yaml" # Show relevant sections around the referenced line ranges echo "=== File: $file (lines 1-120) ===" nl -ba "$file" | sed -n '1,120p' echo "=== Search for tls_config/server_name and __address__ relabeling in the file ===" rg -n "server_name|tls_config:|__address__|relabel_configs" "$file" || true echo "=== Show more context around kube_state_metrics and metrics_server if present ===" rg -n "kube-state-metrics|metrics-server" -n "$file" || true

Repository: openshift/microshift

Length of output: 211

🏁 Script executed:

#!/bin/bash set -euo pipefail file="packaging/observability/otelcol.d/microshift-metrics.yaml" echo "File exists? $(test -f "$file" && echo yes || echo no)" echo "=== First 200 lines (numbered) of $file ===" cat -n "$file" | sed -n '1,200p' echo "=== Search for tls_config/server_name and address relabeling ===" rg -n "server_name|tls_config:|__address__|relabel_configs" "$file" || true echo "=== Lines matching kube-state-metrics or metrics-server ===" rg -n "kube-state-metrics|metrics-server" "$file" -n || true echo "=== Show around lines 1-120 (non-numbered) for context ===" sed -n '1,120p' "$file"

Repository: openshift/microshift

Length of output: 4965

🌐 Web query:

Prometheus scrape_config tls_config server_name default behavior hostname verification when server_name not set

💡 Result:

In Prometheus scrape_configs.tls_config, server_name is the TLS “ServerName” used for both (1) certificate hostname verification and (2) sending an SNI value in the client handshake. If you do not set tls_config.server_name, Prometheus will not send an explicit ServerName based on that field; instead, it relies on the underlying Go TLS behavior. 1) When tls_config.server_name is set - Prometheus passes server_name into Go’s tls.Config as Config.ServerName, which Go uses to verify the hostname on the returned certificates (unless InsecureSkipVerify is true) and to include in the client handshake for virtual hosting (SNI), unless the name is an IP address. [1] - Prometheus’ own docs describe server_name as “ServerName extension to indicate the name of the server.” [2] 2) Default behavior when tls_config.server_name is NOT set - Prometheus leaves TLSConfig.ServerName unset/empty. - In Go’s crypto/tls client dial logic, if Config.ServerName is empty, it infers ServerName from the hostname portion of the address being dialed (the “hostname we’re connecting to”). [3] So, “default hostname verification” is effectively performed against the hostname part of the target address (e.g., the host in host:port), rather than being driven by tls_config.server_name. 3) Practical implication for TLS/SNI in Prometheus scraping - If your target’s certificate is issued to a DNS name (e.g., original-sni.example.com) but the Prometheus scrape target address is an IP or some other host name, certificate verification and/or SNI can fail unless you set tls_config.server_name to the expected DNS name. This issue is commonly observed in mTLS setups where curl with an explicit SNI works, but the equivalent Prometheus config fails when server_name doesn’t match what the server expects for SNI. [4] Key takeaways - tls_config.server_name missing => Go derives TLS ServerName from the target’s dial hostname, and that value is used for certificate hostname verification (unless insecure_skip_verify is true). [3][1] - tls_config.server_name set => that value is used for hostname verification and SNI. [1]

Citations:

1: https://pkg.go.dev/crypto/tls@go1.26.3

2: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

3: https://github.com/golang/go/blob/master/src/crypto/tls/tls.go

4: tls_config.server_name documentation is misleading prometheus/prometheus#10868

🌐 Web query:

OpenShift service-ca CA cert SANs service DNS does it include pod IPs

💡 Result:

OpenShift service-ca (service serving certificate) does not include pod IPs in the CA/service-serving certificate SANs. What SANs it includes (DNS): the generated service serving certificates are “only valid for the internal service DNS name …svc” [1][2]. If the service is headless (no clusterIP set), the certificate also contains a wildcard subject in the format of “*…svc” [1][2]. So, for service-ca-issued service serving certs: - SANs include internal service DNS names (…svc, and for headless services also a wildcard …svc) [1][2]. - SANs do not include pod IPs (the docs describe validity scoped to service DNS/internal communications, not pod IP addressing) [1][2]. Note: separate from this, service-ca can inject the service CA bundle (service-ca.crt) into ConfigMaps/other resources; that is about trust distribution, not SAN contents of service serving certs [1][2].

Citations:

1: https://docs.okd.io/latest/security/certificates/service-serving-certificate.html

2: https://docs.redhat.com/en/documentation/openshift_container_platform/4.9/html/security_and_compliance/configuring-certificates

Set tls_config.server_name for endpoint-discovered HTTPS scrapes (avoid pod-IP TLS hostname verification failures)

File: packaging/observability/otelcol.d/microshift-metrics.yaml (10-45)

Both prometheus/kube_state_metrics and prometheus/metrics_server use kubernetes_sd_configs.role: endpoints, so the HTTPS targets will be the discovered endpoint addresses (typically pod IPs). With only ca_file set, Prometheus will use the target’s dial host (pod IP) as the TLS ServerName for hostname verification, while service-serving certs are issued for service DNS names (….svc) and not pod IPs—so these scrapes are likely to fail TLS hostname verification.

Suggested fix

prometheus/kube_state_metrics: config: scrape_configs: - job_name: kube-state-metrics scrape_interval: 30s scheme: https tls_config: ca_file: /var/lib/microshift/certs/service-ca/ca.crt + server_name: kube-state-metrics.openshift-monitoring.svc kubernetes_sd_configs: - kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig role: endpoints namespaces: names: [openshift-monitoring] prometheus/metrics_server: config: scrape_configs: - job_name: metrics-server scrape_interval: 30s scheme: https tls_config: ca_file: /var/lib/microshift/certs/service-ca/ca.crt + server_name: metrics-server.openshift-monitoring.svc kubernetes_sd_configs: - kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig role: endpoints namespaces: names: [openshift-monitoring]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packaging/observability/otelcol.d/microshift-metrics.yaml` around lines 10 - 27, The HTTPS scrapes discovered via kubernetes_sd_configs.role: endpoints will use pod IPs for TLS ServerName and fail hostname verification; update the TLS config for the prometheus/kube_state_metrics scrape (job_name: kube-state-metrics) and likewise for prometheus/metrics_server to include tls_config.server_name set to the service FQDN (for example kube-state-metrics.openshift-monitoring.svc) so certificate hostname verification uses the service DNS name instead of the pod IP; add tls_config.server_name alongside the existing ca_file entry under prometheus/kube_state_metrics (and the equivalent block for prometheus/metrics_server).

coderabbitai · 2026-05-29T00:45:10Z

+    # Maps release JSON key -> OCP release tag name
+    declare -A METRICS_JSON_MAP=(
+        ["metrics_server"]="metrics-server"
+        ["kube_state_metrics"]="kube-state-metrics"
+        ["node_exporter"]="prometheus-node-exporter"
+    )
+
+    for goarch in amd64 arm64; do
+        arch=${GOARCH_TO_UNAME_MAP["${goarch}"]:-noarch}
+
+        local release_file="${STAGING_DIR}/release_${goarch}.json"
+        local metrics_release_json="${REPOROOT}/assets/optional/metrics-server/release-metrics-${arch}.json"
+
+        local base_release
+        base_release=$(jq -r ".metadata.version" "${release_file}")
+        jq -n "{\"release\": {\"base\": \"$base_release\"}, \"images\": {}}" > "${metrics_release_json}"
+
+        # Update release-metrics-${arch}.json
+        for json_key in "${!METRICS_JSON_MAP[@]}"; do
+            local release_tag="${METRICS_JSON_MAP[$json_key]}"
+            local new_image
+            new_image=$(jq -r ".references.spec.tags[] | select(.name == \"${release_tag}\") | .from.name" "${release_file}")
+            yq -i -o json ".images += {\"${json_key}\": \"${new_image}\"}" "${metrics_release_json}"
+        done


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include kube-rbac-proxy in release-metrics-*.json.

METRICS_JSON_MAP omits kube-rbac-proxy, even though the kustomization rebasing logic handles it. That leaves microshift-metrics-release-info incomplete for disconnected/blueprint flows that rely on the generated image list.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/auto-rebase/rebase.sh` around lines 1136 - 1159, The METRICS_JSON_MAP associative array currently misses the kube-rbac-proxy entry, so update METRICS_JSON_MAP (declared as METRICS_JSON_MAP=( ... )) to include a mapping for the kube-rbac-proxy image (e.g., add ["kube_rbac_proxy"]="kube-rbac-proxy" or a consistent key name used elsewhere), so the for-loop that iterates json_key/release_tag/new_image will pick up and write kube-rbac-proxy into the generated release-metrics-${arch}.json files the same way as metrics_server/kube_state_metrics/node_exporter.

coderabbitai · 2026-05-29T00:45:10Z

+# RHEL 10+ requires UEFI boot
+BOOT_OPTS=""
+RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1)
+if [ "${RHEL_MAJOR:-0}" -ge 10 ]; then
+    BOOT_OPTS="--boot uefi"
+fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Default VM name format:" sed -n '40,44p' scripts/devenv-builder/manage-vm.sh echo echo "Current RHEL major detection:" sed -n '58,63p' scripts/devenv-builder/create-vm.sh echo echo "Regex result for a default RHEL 10 VM name:" printf '%s\n' 'microshift-user-host-10' | grep -oP '(\d+)\.\d+' || echo 'NO_MATCH'

Repository: openshift/microshift

Length of output: 470

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== create-vm.sh context (lines ~50-80) ===" nl -ba scripts/devenv-builder/create-vm.sh | sed -n '45,85p' echo echo "=== manage-vm.sh context (lines ~30-70) ===" nl -ba scripts/devenv-builder/manage-vm.sh | sed -n '30,80p' echo echo "=== Bash set -e + pipefail + failing grep in command substitution test ===" cat > /tmp/sete_cmdsub_test.sh <<'BASH' #!/usr/bin/env bash set -euo pipefail VMNAME="microshift-user-host-10" RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1) echo "RHEL_MAJOR=${RHEL_MAJOR}" BASH chmod +x /tmp/sete_cmdsub_test.sh /tmp/sete_cmdsub_test.sh || echo "Script exited non-zero (as expected if set -e aborts)"

Repository: openshift/microshift

Length of output: 169

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== create-vm.sh context (lines 45-85) ===" sed -n '45,85p' scripts/devenv-builder/create-vm.sh | cat -n echo echo "=== manage-vm.sh context (lines 30-80) ===" sed -n '30,80p' scripts/devenv-builder/manage-vm.sh | cat -n echo echo "=== Bash set -e + pipefail + failing grep in command substitution test ===" cat > /tmp/sete_cmdsub_test.sh <<'BASH' #!/usr/bin/env bash set -euo pipefail VMNAME="microshift-user-host-10" RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1) echo "RHEL_MAJOR=${RHEL_MAJOR}" BASH chmod +x /tmp/sete_cmdsub_test.sh /tmp/sete_cmdsub_test.sh || echo "Script exited non-zero"

Repository: openshift/microshift

Length of output: 3349

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== create-vm.sh lines 70-80 ===" sed -n '70,80p' scripts/devenv-builder/create-vm.sh | cat -n

Repository: openshift/microshift

Length of output: 589

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== create-vm.sh header (first 40 lines) ===" sed -n '1,40p' scripts/devenv-builder/create-vm.sh | cat -n echo echo "=== Find get_vm_name calls in manage-vm.sh ===" rg -n "get_vm_name" scripts/devenv-builder/manage-vm.sh echo echo "=== Show manage-vm.sh around VM name generation ===" rg -n "VMNAME|get_vm_name|rhel_version" scripts/devenv-builder/manage-vm.sh -n -S sed -n '1,160p' scripts/devenv-builder/manage-vm.sh | cat -n | sed -n '1,160p'

Repository: openshift/microshift

Length of output: 10398

🏁 Script executed:

#!/bin/bash sed -n '74p' scripts/devenv-builder/create-vm.sh | cat -A

Repository: openshift/microshift

Length of output: 84

🏁 Script executed:

#!/bin/bash set -euo pipefail # exact line numbers for the problematic block rg -n "RHEL_MAJOR=|BOOT_OPTS=|RHEL 10\+ requires UEFI boot" scripts/devenv-builder/create-vm.sh # show the full block with exact line numbers awk 'NR>=0{print}' scripts/devenv-builder/create-vm.sh | awk 'NR>=1 && NR<=120' >/dev/null

Repository: openshift/microshift

Length of output: 216

Fix RHEL boot detection to not parse ${VMNAME} with (\d+)\.\d+ under set -euo pipefail.

manage-vm.sh can set MICROSHIFT_RHEL_VERSION=10, producing VMNAME=...-10. In scripts/devenv-builder/create-vm.sh (lines 58-62), RHEL_MAJOR=$(... | grep -oP '(\d+)\.\d+' ...) then finds no match, and with set -e the script exits before virt-install, so BOOT_OPTS never becomes --boot uefi.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/devenv-builder/create-vm.sh` around lines 58 - 63, The RHEL major detection in create-vm.sh uses grep -P with a pattern that can fail under set -euo when VMNAME contains just "-10" (no dot), causing the script to exit and BOOT_OPTS to never be set; change the logic to robustly derive RHEL_MAJOR (prefer MICROSHIFT_RHEL_VERSION if present, otherwise extract digits from VMNAME) and avoid grep failing on no-match (e.g., use grep -oE '\d+' with a fallback like "|| true" or use pure shell parameter expansion to strip the suffix and parse digits), then compute RHEL_MAJOR=${RHEL_MAJOR%%.*} or cut -d. -f1 and keep the existing BOOT_OPTS assignment using that value.

openshift-ci · 2026-05-29T02:27:51Z

@copejon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-tests-bootc-el9	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-el9`
ci/prow/e2e-aws-tests-bootc-arm-el9	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-arm-el9`
ci/prow/e2e-aws-tests-bootc-periodic-el9	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-periodic-el9`
ci/prow/e2e-aws-tests-arm	`25ea1dc`	link	true	`/test e2e-aws-tests-arm`
ci/prow/e2e-aws-tests-bootc-periodic-arm-el10	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-periodic-arm-el10`
ci/prow/e2e-aws-tests-periodic	`25ea1dc`	link	true	`/test e2e-aws-tests-periodic`
ci/prow/e2e-aws-tests-bootc-periodic-el10	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-periodic-el10`
ci/prow/e2e-aws-tests	`25ea1dc`	link	true	`/test e2e-aws-tests`
ci/prow/e2e-aws-tests-bootc-el10	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-el10`
ci/prow/test-rebase	`25ea1dc`	link	false	`/test test-rebase`
ci/prow/verify	`25ea1dc`	link	true	`/test verify`
ci/prow/e2e-aws-tests-periodic-arm	`25ea1dc`	link	true	`/test e2e-aws-tests-periodic-arm`
ci/prow/e2e-aws-tests-bootc-arm-el10	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-arm-el10`
ci/prow/e2e-aws-tests-bootc-periodic-arm-el9	`25ea1dc`	link	true	`/test e2e-aws-tests-bootc-periodic-arm-el9`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Add kustomize manifests for metrics-server, kube-state-metrics, and node-exporter as optional components delivered via a single microshift-metrics RPM. Includes SCC RBAC grants (hostmount-anyuid for metrics-server, privileged for node-exporter), service-ca TLS integration, health check registration, and release-info packaging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hostPath cert volumes with Kubernetes Secrets/ConfigMaps to work under enforcing SELinux (container_t cannot read container_var_lib_t files). Generate a dedicated metrics-server-kubelet-client cert under the kube-apiserver-to-kubelet-signer and provision it into the openshift-monitoring namespace before kustomize applies the manifests. - Add metrics-server-kubelet-client cert to kube-apiserver-to-kubelet-signer chain - Add provisionMetricsServerCerts() to create Secret/ConfigMap from cert on disk - Replace hostPath volumes with Secret/ConfigMap references in deployment - Remove SCC use rule (no longer needs hostmount-anyuid) - Add system:metrics-server User subject to ClusterRoleBinding for kubelet auth - Prefer InternalIP over Hostname for kubelet address resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add drop-in config infrastructure so the otel-collector can scrape metrics exporters installed by the microshift-metrics RPM. The service file dynamically loads YAML configs from otelcol.d/ and the collector merges them with the base config. Move metrics-server cert provisioning to run asynchronously after kustomize, waiting for the namespace to exist via PollUntilContextTimeout before creating the kubelet client Secret and CA ConfigMap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add CMO (cluster-monitoring-operator) to the rebase infrastructure so metrics exporter manifests stay in sync with upstream during future rebases. CMO is cloned via a new OPTIONAL_COMPONENTS list and its manifests are copied using a handle_assets.py recipe. RBAC files (serviceaccount, clusterrole, clusterrolebinding, service) are taken directly from CMO. Deployments and namespace use git_restore to preserve MicroShift's single-node adaptations while surfacing upstream changes for human review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extend the microshift-metrics otel-collector drop-in to scrape kube-state-metrics and metrics-server in addition to node-exporter. Both new receivers use Kubernetes endpoints service discovery with service-CA TLS, targeting the openshift-monitoring namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add update_metrics_images() to rebase.sh to populate architecture- specific kustomization overlays and release-metrics JSON files from the OCP release payload. Covers metrics-server, kube-state-metrics, node-exporter, and kube-rbac-proxy images for both x86_64 and aarch64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…O upstream Rename all manifests from numeric prefixes to CMO-matching names for both metrics-server and node-exporter. Add missing CMO files verbatim (configmap-audit-profiles, network-policy, PDB, SCC, kube-rbac-proxy secret, accelerators configmap). Fix metrics-server deployment image placeholder from empty string to quay.io/openshift/metrics-server so the kustomize images transformer can match and inject the OCP digest. Replace git_restore with ignore + reason in the rebase recipe since they are functionally identical under no_clean and ignore communicates why MicroShift diverges from CMO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add type-grouped numeric prefixes to metrics-server manifest files to enforce correct deployment ordering via kustomize. Grouping: 00-namespace, 01-RBAC/SA, 02-configmap, 03-workload, 04-service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename KSM manifests from old sequential scheme (01-serviceaccount through 05-service) to type-grouped CMO filenames with prefixes: 01-RBAC/SA, 02-configmap/secret, 03-workload, 04-service. Add two new CMO files: kube-rbac-proxy-secret and custom-resource-state configmap. Update spec install loop for new naming pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add type-grouped numeric prefixes to node-exporter manifest files: 01-RBAC/SA/SCC, 02-configmap/secret, 03-workload, 04-service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Multiple metrics exporters deploy to the openshift-monitoring namespace. The previous code overwrote the workloads map entry on each iteration, so only the last component's workloads were checked. Add mergeWorkloads() to append workloads from all components sharing a namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@assets/optional/kube-state-metrics/01-cluster-role.yaml`:
- Around line 12-30: Remove cluster-wide secrets access from the ClusterRole by
deleting the "secrets" entry from the resources list in the Role/ClusterRole
definition (the block that currently lists configmaps, secrets, nodes, pods,
services, serviceaccounts, ...). Edit the resources array so it no longer
includes "secrets" (leaving other resources intact) to enforce least-privilege
RBAC for the kube-state-metrics exporter.

In `@assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml`:
- Around line 417-470: The HELP texts for the metrics
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu and
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_memory
are incorrect — they say "Minimum" but the metric exposes maxAllowed; update
both help strings to refer to "Maximum" (e.g., "Maximum cpu resources..." and
"Maximum memory resources...") so the Prometheus series are labeled correctly;
locate the help fields adjacent to those metric names and change the wording
accordingly.

In `@assets/optional/kube-state-metrics/03-deployment.yaml`:
- Around line 64-68: Update the pod-level securityContext and each container's
securityContext in the Deployment manifest (the pod spec and the three container
blocks shown around the empty securityContext entries) to enforce non-root and
hardened settings: set pod.securityContext.runAsNonRoot: true and a non-root
runAsUser (e.g., 1000) if not already set; for each container set
securityContext.runAsNonRoot: true, runAsUser consistent with the pod,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, privileged:
false, and capabilities.drop: ["ALL"] (and add only any minimal capabilities
required by that specific container); also add a seccompProfile: { type:
RuntimeDefault } at the pod or container level. Ensure these changes are applied
to the three container blocks and the pod-level securityContext referred to in
the diff.
- Around line 35-126: Add liveness and readiness probes plus CPU/memory limits
for each container: for the kube-state-metrics container (name:
kube-state-metrics) add an HTTPGet liveness and readiness probe targeting the
exporter on localhost:8081 (or the telemetry port 8082 if appropriate) with
sensible initialDelay/period/timeout, and add resources.limits.cpu and
resources.limits.memory alongside the existing requests; for the
kube-rbac-proxy-main and kube-rbac-proxy-self containers (names:
kube-rbac-proxy-main and kube-rbac-proxy-self) add liveness/readiness probes
(HTTPGet or TCP) pointing at their container ports 8443 and 9443 respectively
(use /healthz or /readyz or /metrics as appropriate) with sensible timing
values, and add matching resources.limits.cpu and resources.limits.memory for
each container to bound resource usage. Ensure probe fields and limits are added
under each container spec where args, image, name, and resources exist.

In `@assets/optional/node-exporter/03-daemonset.yaml`:
- Line 129: The DaemonSet sets hostPID: true which unnecessarily exposes the
host PID namespace; remove the hostPID: true line from the DaemonSet spec to
follow the guideline "No hostPID, hostNetwork, hostIPC, privileged: true", and
verify the exporter still works using the existing mounts (/host/root/proc and
/host/sys); only reintroduce hostPID if a specific collector explicitly requires
host PID visibility and document that requirement next to the hostPID setting.
- Around line 144-146: The init container "init-textfile" is unnecessarily
running privileged and as root; update its securityContext to remove privileged:
true and runAsUser: 0, and instead set runAsNonRoot: true, runAsUser to a
non-root UID (e.g., 1000) or remove runAsUser so the image defaults to non-root,
set readOnlyRootFilesystem: true, and allowPrivilegeEscalation: false; ensure no
hostPID/hostNetwork/hostIPC are enabled for that container and retain only the
minimal mounts (the emptyDir textfile and the read-only /var/log/wtmp) so the
init container is least-privileged.
- Around line 65-69: The node-exporter container currently has an empty
securityContext and no resource limits; update the container spec for
"node-exporter" to explicitly set securityContext: runAsNonRoot: true,
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and drop all
capabilities (capabilities: { drop: ["ALL"] }) then add back only the minimal
required capability if needed (e.g., NET_RAW) via add. Also add resource limits
to match requests (set resources.limits.cpu and resources.limits.memory
alongside the existing resources.requests.cpu and resources.requests.memory).
Ensure these changes are applied to the container entry named "node-exporter" in
the DaemonSet manifest.
- Around line 88-99: The kube-rbac-proxy daemonset currently sets
--secure-listen-address=[$(IP)]:9100 (and uses env name IP) while running with
hostNetwork: true, which binds to the node IP and is not reachable by the
collector that scrapes 127.0.0.1:9100; change the proxy to bind to loopback
instead or make the collector target the node IP. Concretely, in the DaemonSet
arguments for kube-rbac-proxy (the --secure-listen-address entry and the env
name: IP usage) replace binding to [$ (IP)] with 127.0.0.1:9100 (or remove the
envRef usage) so the proxy listens on localhost, or alternatively update the
collector scrape target to use the pod/node IP; apply the same fix for the
duplicated instance referenced around the second occurrence (lines noted in the
comment).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 26026edc-818b-432f-8470-dab3e234aa07

📥 Commits

Reviewing files that changed from the base of the PR and between 25ea1dc and 5843fac.

📒 Files selected for processing (34)

assets/components/metrics-server/kubelet-ca-configmap.yaml
assets/components/metrics-server/kubelet-client-secret.yaml
assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
assets/optional/kube-state-metrics/01-cluster-role.yaml
assets/optional/kube-state-metrics/01-service-account.yaml
assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml
assets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yaml
assets/optional/kube-state-metrics/03-deployment.yaml
assets/optional/kube-state-metrics/04-service.yaml
assets/optional/kube-state-metrics/kustomization.yaml
assets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yaml
assets/optional/metrics-server/01-cluster-role-binding.yaml
assets/optional/metrics-server/01-cluster-role.yaml
assets/optional/metrics-server/01-role-binding-auth-reader.yaml
assets/optional/metrics-server/01-service-account.yaml
assets/optional/metrics-server/02-configmap-audit-profiles.yaml
assets/optional/metrics-server/03-deployment.yaml
assets/optional/metrics-server/04-api-service.yaml
assets/optional/metrics-server/04-service.yaml
assets/optional/metrics-server/kustomization.yaml
assets/optional/metrics-server/network-policy-downstream.yaml
assets/optional/metrics-server/pod-disruption-budget.yaml
assets/optional/node-exporter/01-cluster-role-binding.yaml
assets/optional/node-exporter/01-cluster-role.yaml
assets/optional/node-exporter/01-security-context-constraints.yaml
assets/optional/node-exporter/01-service-account.yaml
assets/optional/node-exporter/02-accelerators-collector-configmap.yaml
assets/optional/node-exporter/02-kube-rbac-proxy-secret.yaml
assets/optional/node-exporter/03-daemonset.yaml
assets/optional/node-exporter/04-service.yaml
assets/optional/node-exporter/kustomization.yaml
packaging/rpm/microshift.spec
pkg/healthcheck/microshift_optional_workloads.go
scripts/auto-rebase/assets_metrics.yaml

✅ Files skipped from review due to trivial changes (12)

assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
assets/optional/metrics-server/01-cluster-role-binding.yaml
assets/optional/metrics-server/01-cluster-role.yaml
assets/optional/metrics-server/network-policy-downstream.yaml
assets/optional/metrics-server/01-service-account.yaml
assets/optional/metrics-server/04-service.yaml
assets/components/metrics-server/kubelet-client-secret.yaml
assets/optional/node-exporter/01-security-context-constraints.yaml
assets/optional/node-exporter/01-service-account.yaml
assets/optional/kube-state-metrics/01-service-account.yaml
assets/optional/metrics-server/03-deployment.yaml
assets/optional/kube-state-metrics/04-service.yaml

🚧 Files skipped from review as they are similar to previous changes (3)

assets/optional/kube-state-metrics/kustomization.yaml
assets/optional/metrics-server/kustomization.yaml
packaging/rpm/microshift.spec

coderabbitai · 2026-06-03T20:24:48Z

+- apiGroups:
+  - ""
+  resources:
+  - configmaps
+  - secrets
+  - nodes
+  - pods
+  - services
+  - serviceaccounts
+  - resourcequotas
+  - replicationcontrollers
+  - limitranges
+  - persistentvolumeclaims
+  - persistentvolumes
+  - namespaces
+  - endpoints
+  verbs:
+  - list
+  - watch


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop cluster-wide secrets access from this role.

Lines 15-16 let this exporter list/watch every Secret in the cluster. That exposes full secret payloads if the pod or service account is compromised, which is a much larger blast radius than the rest of this read-only metrics role needs.

Suggested fix

- apiGroups: - "" resources: - configmaps - - secrets - nodes - pods - services

As per coding guidelines, `RBAC: least privilege; no cluster-admin for workloads`.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- apiGroups:

- ""

resources:

- configmaps

- secrets

- nodes

- pods

- services

- serviceaccounts

- resourcequotas

- replicationcontrollers

- limitranges

- persistentvolumeclaims

- persistentvolumes

- namespaces

- endpoints

verbs:

- list

- watch

- apiGroups:

- ""

resources:

- configmaps

- nodes

- pods

- services

- serviceaccounts

- resourcequotas

- replicationcontrollers

- limitranges

- persistentvolumeclaims

- persistentvolumes

- namespaces

- endpoints

verbs:

- list

- watch

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/01-cluster-role.yaml` around lines 12 - 30, Remove cluster-wide secrets access from the ClusterRole by deleting the "secrets" entry from the resources list in the Role/ClusterRole definition (the block that currently lists configmaps, secrets, nodes, pods, services, serviceaccounts, ...). Edit the resources array so it no longer includes "secrets" (leaving other resources intact) to enforce least-privilege RBAC for the kube-state-metrics exporter.

coderabbitai · 2026-06-03T20:24:49Z

+        - "commonLabels":
+            "resource": "cpu"
+            "unit": "cores"
+          "each":
+            "gauge":
+              "labelsFromPath":
+                "container":
+                - "containerName"
+              "path":
+              - "spec"
+              - "resourcePolicy"
+              - "containerPolicies"
+              "valueFrom":
+              - "maxAllowed"
+              - "cpu"
+            "type": "Gauge"
+          "help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name."
+          "labelsFromPath":
+            "namespace":
+            - "metadata"
+            - "namespace"
+            "target_api_version":
+            - "spec"
+            - "targetRef"
+            - "apiVersion"
+            "target_kind":
+            - "spec"
+            - "targetRef"
+            - "kind"
+            "target_name":
+            - "spec"
+            - "targetRef"
+            - "name"
+            "verticalpodautoscaler":
+            - "metadata"
+            - "name"
+          "name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu"
+        - "commonLabels":
+            "resource": "memory"
+            "unit": "bytes"
+          "each":
+            "gauge":
+              "labelsFromPath":
+                "container":
+                - "containerName"
+              "path":
+              - "spec"
+              - "resourcePolicy"
+              - "containerPolicies"
+              "valueFrom":
+              - "maxAllowed"
+              - "memory"
+            "type": "Gauge"
+          "help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name."


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the maxAllowed HELP text.

Lines 433 and 470 still say "Minimum" even though these metrics expose maxAllowed. That mislabels the series in Prometheus output.

Suggested fix

- "help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name." + "help": "Maximum cpu resources the VerticalPodAutoscaler can set for containers matching the name." ... - "help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name." + "help": "Maximum memory resources the VerticalPodAutoscaler can set for containers matching the name."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- "commonLabels":

"resource": "cpu"

"unit": "cores"

"each":

"gauge":

"labelsFromPath":

"container":

- "containerName"

"path":

- "spec"

- "resourcePolicy"

- "containerPolicies"

"valueFrom":

- "maxAllowed"

- "cpu"

"type": "Gauge"

"help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name."

"labelsFromPath":

"namespace":

- "metadata"

- "namespace"

"target_api_version":

- "spec"

- "targetRef"

- "apiVersion"

"target_kind":

- "spec"

- "targetRef"

- "kind"

"target_name":

- "spec"

- "targetRef"

- "name"

"verticalpodautoscaler":

- "metadata"

- "name"

"name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu"

- "commonLabels":

"resource": "memory"

"unit": "bytes"

"each":

"gauge":

"labelsFromPath":

"container":

- "containerName"

"path":

- "spec"

- "resourcePolicy"

- "containerPolicies"

"valueFrom":

- "maxAllowed"

- "memory"

"type": "Gauge"

"help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name."

- "commonLabels":

"resource": "cpu"

"unit": "cores"

"each":

"gauge":

"labelsFromPath":

"container":

- "containerName"

"path":

- "spec"

- "resourcePolicy"

- "containerPolicies"

"valueFrom":

- "maxAllowed"

- "cpu"

"type": "Gauge"

"help": "Maximum cpu resources the VerticalPodAutoscaler can set for containers matching the name."

"labelsFromPath":

"namespace":

- "metadata"

- "namespace"

"target_api_version":

- "spec"

- "targetRef"

- "apiVersion"

"target_kind":

- "spec"

- "targetRef"

- "kind"

"target_name":

- "spec"

- "targetRef"

- "name"

"verticalpodautoscaler":

- "metadata"

- "name"

"name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu"

- "commonLabels":

"resource": "memory"

"unit": "bytes"

"each":

"gauge":

"labelsFromPath":

"container":

- "containerName"

"path":

- "spec"

- "resourcePolicy"

- "containerPolicies"

"valueFrom":

- "maxAllowed"

- "memory"

"type": "Gauge"

"help": "Maximum memory resources the VerticalPodAutoscaler can set for containers matching the name."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml` around lines 417 - 470, The HELP texts for the metrics verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu and verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_memory are incorrect — they say "Minimum" but the metric exposes maxAllowed; update both help strings to refer to "Maximum" (e.g., "Maximum cpu resources..." and "Maximum memory resources...") so the Prometheus series are labeled correctly; locate the help fields adjacent to those metric names and change the wording accordingly.

coderabbitai · 2026-06-03T20:24:49Z

+      containers:
+      - args:
+        - --host=127.0.0.1
+        - --port=8081
+        - --telemetry-host=127.0.0.1
+        - --telemetry-port=8082
+        - --custom-resource-state-config-file=/etc/kube-state-metrics/custom-resource-state-configmap.yaml
+        - |
+          --metric-denylist=
+          ^kube_secret_labels$,
+          ^kube_.+_annotations$,
+          ^kube_customresource_.+_annotations_info$,
+          ^kube_customresource_.+_labels_info$
+        - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*]
+        - |
+          --metric-denylist=
+          ^kube_.+_created$,
+          ^kube_.+_metadata_resource_version$,
+          ^kube_replicaset_metadata_generation$,
+          ^kube_replicaset_status_observed_generation$,
+          ^kube_pod_restart_policy$,
+          ^kube_pod_init_container_status_terminated$,
+          ^kube_pod_init_container_status_running$,
+          ^kube_pod_container_status_terminated$,
+          ^kube_pod_container_status_running$,
+          ^kube_pod_completion_time$,
+          ^kube_pod_status_scheduled$
+        image: quay.io/openshift/kube-state-metrics
+        name: kube-state-metrics
+        resources:
+          requests:
+            cpu: 2m
+            memory: 80Mi
+        securityContext: {}
+        terminationMessagePolicy: FallbackToLogsOnError
+        volumeMounts:
+        - mountPath: /tmp
+          name: volume-directive-shadow
+          readOnly: false
+        - mountPath: /etc/kube-state-metrics
+          name: kube-state-metrics-custom-resource-state-configmap
+          readOnly: true
+      - args:
+        - --secure-listen-address=:8443
+        - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
+        - --upstream=http://127.0.0.1:8081/
+        - --tls-cert-file=/etc/tls/private/tls.crt
+        - --tls-private-key-file=/etc/tls/private/tls.key
+        - --config-file=/etc/kube-rbac-policy/config.yaml
+        image: quay.io/openshift/kube-rbac-proxy
+        name: kube-rbac-proxy-main
+        ports:
+        - containerPort: 8443
+          name: https-main
+        resources:
+          requests:
+            cpu: 1m
+            memory: 15Mi
+        securityContext: {}
+        terminationMessagePolicy: FallbackToLogsOnError
+        volumeMounts:
+        - mountPath: /etc/tls/private
+          name: kube-state-metrics-tls
+          readOnly: false
+        - mountPath: /etc/kube-rbac-policy
+          name: kube-state-metrics-kube-rbac-proxy-config
+          readOnly: true
+      - args:
+        - --secure-listen-address=:9443
+        - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
+        - --upstream=http://127.0.0.1:8082/
+        - --tls-cert-file=/etc/tls/private/tls.crt
+        - --tls-private-key-file=/etc/tls/private/tls.key
+        - --config-file=/etc/kube-rbac-policy/config.yaml
+        image: quay.io/openshift/kube-rbac-proxy
+        name: kube-rbac-proxy-self
+        ports:
+        - containerPort: 9443
+          name: https-self
+        resources:
+          requests:
+            cpu: 1m
+            memory: 15Mi
+        securityContext: {}
+        terminationMessagePolicy: FallbackToLogsOnError
+        volumeMounts:
+        - mountPath: /etc/tls/private
+          name: kube-state-metrics-tls
+          readOnly: false
+        - mountPath: /etc/kube-rbac-policy
+          name: kube-state-metrics-kube-rbac-proxy-config
+          readOnly: true


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add probes and resource limits for all three containers.

None of the containers defines liveness/readiness probes or cpu/memory limits. That means a hung exporter/proxy can stay in service indefinitely, and resource growth is left unbounded.

As per coding guidelines, "Resource limits (cpu, memory) on every container" and "Liveness + readiness probes defined".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 35 - 126, Add liveness and readiness probes plus CPU/memory limits for each container: for the kube-state-metrics container (name: kube-state-metrics) add an HTTPGet liveness and readiness probe targeting the exporter on localhost:8081 (or the telemetry port 8082 if appropriate) with sensible initialDelay/period/timeout, and add resources.limits.cpu and resources.limits.memory alongside the existing requests; for the kube-rbac-proxy-main and kube-rbac-proxy-self containers (names: kube-rbac-proxy-main and kube-rbac-proxy-self) add liveness/readiness probes (HTTPGet or TCP) pointing at their container ports 8443 and 9443 respectively (use /healthz or /readyz or /metrics as appropriate) with sensible timing values, and add matching resources.limits.cpu and resources.limits.memory for each container to bound resource usage. Ensure probe fields and limits are added under each container spec where args, image, name, and resources exist.

coderabbitai · 2026-06-03T20:24:49Z

+        resources:
+          requests:
+            cpu: 2m
+            memory: 80Mi
+        securityContext: {}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit container hardening here.

All three containers and the pod use empty securityContexts, so this manifest is missing the required non-root, no-privilege-escalation, dropped-capabilities, read-only-rootfs, and seccomp settings. Relying on SCC defaults leaves this optional component with a weaker security contract than the repo guidelines require.

As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false" and "Drop ALL capabilities, add only what is required".

Also applies to: 89-93, 114-118, 130-130

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 64 - 68, Update the pod-level securityContext and each container's securityContext in the Deployment manifest (the pod spec and the three container blocks shown around the empty securityContext entries) to enforce non-root and hardened settings: set pod.securityContext.runAsNonRoot: true and a non-root runAsUser (e.g., 1000) if not already set; for each container set securityContext.runAsNonRoot: true, runAsUser consistent with the pod, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, privileged: false, and capabilities.drop: ["ALL"] (and add only any minimal capabilities required by that specific container); also add a seccompProfile: { type: RuntimeDefault } at the pod or container level. Ensure these changes are applied to the three container blocks and the pod-level securityContext referred to in the diff.

coderabbitai · 2026-06-03T20:24:49Z

+        resources:
+          requests:
+            cpu: 8m
+            memory: 32Mi
+        securityContext: {}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden the main exporter container too.

node-exporter is the only long-lived container here with an empty securityContext and no resource limits, even though it is the container with direct host mounts. That leaves the most exposed container dependent on defaults instead of an explicit manifest contract.

As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false", "Drop ALL capabilities, add only what is required", and "Resource limits (cpu, memory) on every container".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 65 - 69, The node-exporter container currently has an empty securityContext and no resource limits; update the container spec for "node-exporter" to explicitly set securityContext: runAsNonRoot: true, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and drop all capabilities (capabilities: { drop: ["ALL"] }) then add back only the minimal required capability if needed (e.g., NET_RAW) via add. Also add resource limits to match requests (set resources.limits.cpu and resources.limits.memory alongside the existing resources.requests.cpu and resources.requests.memory). Ensure these changes are applied to the container entry named "node-exporter" in the DaemonSet manifest.

coderabbitai · 2026-06-03T20:24:49Z

+          name: node-exporter-kube-rbac-proxy-config
+          readOnly: true
+      hostNetwork: true
+      hostPID: true


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop hostPID unless a collector really needs the host PID namespace.

The exporter is already pointed at /host/root/proc and /host/sys, so the host PID namespace does not appear necessary for the behavior shown here. Keeping it enabled expands process visibility for no clear benefit.

As per coding guidelines, "No hostPID, hostNetwork, hostIPC, privileged: true".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/node-exporter/03-daemonset.yaml` at line 129, The DaemonSet sets hostPID: true which unnecessarily exposes the host PID namespace; remove the hostPID: true line from the DaemonSet spec to follow the guideline "No hostPID, hostNetwork, hostIPC, privileged: true", and verify the exporter still works using the existing mounts (/host/root/proc and /host/sys); only reintroduce hostPID if a specific collector explicitly requires host PID visibility and document that requirement next to the hostPID setting.

coderabbitai · 2026-06-03T20:24:49Z

+        securityContext:
+          privileged: true
+          runAsUser: 0


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The init container is over-privileged for the mounts shown here.

init-textfile only gets the textfile emptyDir plus a read-only /var/log/wtmp mount, but it still runs privileged as UID 0. That is a much broader permission set than the manifest visibly requires.

As per coding guidelines, "No hostPID, hostNetwork, hostIPC, privileged: true" and "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 144 - 146, The init container "init-textfile" is unnecessarily running privileged and as root; update its securityContext to remove privileged: true and runAsUser: 0, and instead set runAsNonRoot: true, runAsUser to a non-root UID (e.g., 1000) or remove runAsUser so the image defaults to non-root, set readOnlyRootFilesystem: true, and allowPrivilegeEscalation: false; ensure no hostPID/hostNetwork/hostIPC are enabled for that container and retain only the minimal mounts (the emptyDir textfile and the read-only /var/log/wtmp) so the init container is least-privileged.

coderabbitai

🧹 Nitpick comments (2)

assets/optional/kube-state-metrics/03-deployment.yaml (1)

96-98: ⚡ Quick win

TLS secret mount should be read-only.

The kube-state-metrics-tls secret volume is mounted with readOnly: false. Secrets should be mounted read-only unless modification is required.

Suggested fix

        volumeMounts:
        - mountPath: /etc/tls/private
          name: kube-state-metrics-tls
-          readOnly: false
+          readOnly: true

Also applies to line 121-123.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 96 - 98,
The TLS secret volume for kube-state-metrics is mounted with readOnly: false;
change the volumeMount for name: kube-state-metrics-tls (mountPath:
/etc/tls/private) to readOnly: true, and make the same change for the other
kube-state-metrics-tls mount instance referenced later in the file so both
secret mounts are read-only.

assets/optional/node-exporter/03-daemonset.yaml (1)

121-124: ⚡ Quick win

TLS secret mount should be read-only.

Same as kube-state-metrics: the TLS secret is mounted with readOnly: false.

Suggested fix

        volumeMounts:
        - mountPath: /etc/tls/private
          name: node-exporter-tls
-          readOnly: false
+          readOnly: true

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 121 - 124, The
TLS secret volumeMount for the node-exporter container is currently writable;
change the volumeMount entry named "node-exporter-tls" (mountPath
"/etc/tls/private") to be readOnly: true so the secret cannot be modified at
runtime; update the volumeMount under the container spec where "volumeMounts"
contains name: node-exporter-tls and mountPath: /etc/tls/private to set
readOnly: true.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@assets/optional/kube-state-metrics/03-deployment.yaml`:
- Around line 96-98: The TLS secret volume for kube-state-metrics is mounted
with readOnly: false; change the volumeMount for name: kube-state-metrics-tls
(mountPath: /etc/tls/private) to readOnly: true, and make the same change for
the other kube-state-metrics-tls mount instance referenced later in the file so
both secret mounts are read-only.

In `@assets/optional/node-exporter/03-daemonset.yaml`:
- Around line 121-124: The TLS secret volumeMount for the node-exporter
container is currently writable; change the volumeMount entry named
"node-exporter-tls" (mountPath "/etc/tls/private") to be readOnly: true so the
secret cannot be modified at runtime; update the volumeMount under the container
spec where "volumeMounts" contains name: node-exporter-tls and mountPath:
/etc/tls/private to set readOnly: true.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79ac37c5-0900-42aa-b000-f7209d48285c

📥 Commits

Reviewing files that changed from the base of the PR and between 5843fac and e53c5d4.

📒 Files selected for processing (54)

assets/components/metrics-server/kubelet-ca-configmap.yaml
assets/components/metrics-server/kubelet-client-secret.yaml
assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
assets/optional/kube-state-metrics/01-cluster-role.yaml
assets/optional/kube-state-metrics/01-service-account.yaml
assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml
assets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yaml
assets/optional/kube-state-metrics/03-deployment.yaml
assets/optional/kube-state-metrics/04-service.yaml
assets/optional/kube-state-metrics/kustomization.aarch64.yaml
assets/optional/kube-state-metrics/kustomization.x86_64.yaml
assets/optional/kube-state-metrics/kustomization.yaml
assets/optional/metrics-server/00-namespace.yaml
assets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yaml
assets/optional/metrics-server/01-cluster-role-binding.yaml
assets/optional/metrics-server/01-cluster-role.yaml
assets/optional/metrics-server/01-role-binding-auth-reader.yaml
assets/optional/metrics-server/01-service-account.yaml
assets/optional/metrics-server/02-configmap-audit-profiles.yaml
assets/optional/metrics-server/03-deployment.yaml
assets/optional/metrics-server/04-api-service.yaml
assets/optional/metrics-server/04-service.yaml
assets/optional/metrics-server/kustomization.aarch64.yaml
assets/optional/metrics-server/kustomization.x86_64.yaml
assets/optional/metrics-server/kustomization.yaml
assets/optional/metrics-server/network-policy-downstream.yaml
assets/optional/metrics-server/pod-disruption-budget.yaml
assets/optional/metrics-server/release-metrics-aarch64.json
assets/optional/metrics-server/release-metrics-x86_64.json
assets/optional/node-exporter/01-cluster-role-binding.yaml
assets/optional/node-exporter/01-cluster-role.yaml
assets/optional/node-exporter/01-security-context-constraints.yaml
assets/optional/node-exporter/01-service-account.yaml
assets/optional/node-exporter/02-accelerators-collector-configmap.yaml
assets/optional/node-exporter/02-kube-rbac-proxy-secret.yaml
assets/optional/node-exporter/03-daemonset.yaml
assets/optional/node-exporter/04-service.yaml
assets/optional/node-exporter/kustomization.aarch64.yaml
assets/optional/node-exporter/kustomization.x86_64.yaml
assets/optional/node-exporter/kustomization.yaml
packaging/observability/microshift-observability.service
packaging/observability/otelcol.d/microshift-metrics.yaml
packaging/rpm/microshift.spec
pkg/cmd/init.go
pkg/cmd/metrics.go
pkg/cmd/run.go
pkg/healthcheck/microshift_optional_workloads.go
pkg/util/cryptomaterial/certinfo.go
scripts/auto-rebase/assets_metrics.yaml
scripts/auto-rebase/rebase.sh
scripts/devenv-builder/config/kickstart.ks.template
scripts/devenv-builder/create-vm.sh
scripts/devenv-builder/manage-vm.sh
test/bin/common.sh

✅ Files skipped from review due to trivial changes (8)

assets/optional/node-exporter/kustomization.aarch64.yaml
assets/optional/metrics-server/release-metrics-x86_64.json
assets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yaml
assets/optional/metrics-server/04-api-service.yaml
assets/components/metrics-server/kubelet-ca-configmap.yaml
assets/optional/metrics-server/02-configmap-audit-profiles.yaml
assets/optional/node-exporter/kustomization.x86_64.yaml
assets/optional/kube-state-metrics/kustomization.x86_64.yaml

🚧 Files skipped from review as they are similar to previous changes (39)

assets/components/metrics-server/kubelet-client-secret.yaml
pkg/cmd/init.go
pkg/cmd/run.go
assets/optional/metrics-server/01-cluster-role-binding.yaml
assets/optional/metrics-server/kustomization.x86_64.yaml
assets/optional/node-exporter/01-security-context-constraints.yaml
assets/optional/metrics-server/00-namespace.yaml
assets/optional/metrics-server/01-service-account.yaml
assets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yaml
assets/optional/node-exporter/01-service-account.yaml
assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
assets/optional/node-exporter/02-accelerators-collector-configmap.yaml
assets/optional/node-exporter/04-service.yaml
assets/optional/metrics-server/01-cluster-role.yaml
assets/optional/kube-state-metrics/04-service.yaml
assets/optional/metrics-server/network-policy-downstream.yaml
assets/optional/kube-state-metrics/01-service-account.yaml
assets/optional/metrics-server/04-service.yaml
assets/optional/kube-state-metrics/kustomization.aarch64.yaml
assets/optional/kube-state-metrics/kustomization.yaml
packaging/observability/otelcol.d/microshift-metrics.yaml
assets/optional/metrics-server/kustomization.yaml
scripts/devenv-builder/manage-vm.sh
assets/optional/node-exporter/02-kube-rbac-proxy-secret.yaml
assets/optional/kube-state-metrics/01-cluster-role.yaml
assets/optional/metrics-server/03-deployment.yaml
packaging/observability/microshift-observability.service
pkg/util/cryptomaterial/certinfo.go
assets/optional/node-exporter/kustomization.yaml
pkg/cmd/metrics.go
assets/optional/metrics-server/pod-disruption-budget.yaml
scripts/auto-rebase/assets_metrics.yaml
assets/optional/node-exporter/01-cluster-role.yaml
assets/optional/metrics-server/kustomization.aarch64.yaml
assets/optional/node-exporter/01-cluster-role-binding.yaml
pkg/healthcheck/microshift_optional_workloads.go
assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml
packaging/rpm/microshift.spec
scripts/auto-rebase/rebase.sh

Split the single microshift-metrics RPM into three independently installable packages to satisfy the USHIFT-6890 requirement that each exporter be enabled individually: - microshift-metrics-server - microshift-metrics-kube-state-metrics - microshift-metrics-node-exporter All three can be installed together with: dnf install microshift-metrics-* Split the combined otel-collector drop-in config into per-package configs so each RPM ships its own scrape configuration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 29, 2026

openshift-ci Bot requested review from agullon and pacevedom May 29, 2026 00:32

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026

coderabbitai Bot requested changes May 29, 2026

View reviewed changes

copejon marked this pull request as draft May 29, 2026 04:33

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2026

copejon and others added 11 commits June 3, 2026 15:08

USHIFT-6951: Add numeric prefixes to node-exporter manifests

167876e

Add type-grouped numeric prefixes to node-exporter manifest files: 01-RBAC/SA/SCC, 02-configmap/secret, 03-workload, 04-service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

copejon force-pushed the ushift-6951-metrics-exporters branch from 5843fac to e53c5d4 Compare June 3, 2026 20:08

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Conversation

copejon commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's included

Summary by CodeRabbit

Release Notes

Uh oh!

openshift-ci-robot commented May 29, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's included

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (1 error, 2 warnings)

Uh oh!

openshift-ci Bot commented May 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented May 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

copejon commented May 29, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented May 29, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading